On the asymptotics of random forests

Author

  • Erwan Scornet
Abstract

The last decade has witnessed a growing interest in random forest models, which are recognized to exhibit good practical performance, especially in high-dimensional settings. On the theoretical side, however, their predictive power remains largely unexplained, thereby creating a gap between theory and practice. The aim of this paper is twofold. Firstly, we provide theoretical guarantees to link finite forests used in practice (with a finite number M of trees) to their asymptotic counterparts (with M = ∞). Using empirical process theory, we prove a uniform central limit theorem for a large class of random forest estimates, which holds in particular for Breiman’s original forests. Secondly, we show that infinite forest consistency implies finite forest consistency, and thus we state the consistency of several infinite forests. In particular, we prove that q-quantile forests, which are close in spirit to Breiman’s forests but easier to study, are able to combine inconsistent trees to obtain a final consistent prediction, thus highlighting the benefits of random forests compared to single trees.

Index Terms: Random forests, randomization, consistency, central limit theorem, empirical process, number of trees, q-quantile.

2010 Mathematics Subject Classification: 62G05, 62G20.
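The link between a finite forest (M trees) and its infinite counterpart can be illustrated with a small simulation. The sketch below is only a toy under stated assumptions: each "tree" is a depth-1 stump on one-dimensional data with a uniformly random split point, which is neither Breiman's forest nor the q-quantile forest studied in the paper, and all function names are hypothetical. With the training data held fixed, it measures how much the forest prediction at a query point fluctuates over the tree randomness alone; that fluctuation should shrink as M grows, which is the sense in which a finite forest approaches the M = ∞ forest.

```python
import random
import statistics


def make_data(n, rng):
    """Toy regression sample (an assumption for illustration): y = x^2 + noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def stump_predict(x, xs, ys, threshold):
    """One randomized tree: a stump that averages the responses of the
    training points falling on the same side of `threshold` as x."""
    same_side = [y for xi, y in zip(xs, ys) if (xi <= threshold) == (x <= threshold)]
    # Fall back to the global mean if the cell containing x is empty.
    return statistics.fmean(same_side) if same_side else statistics.fmean(ys)


def forest_predict(x, xs, ys, m_trees, rng):
    """Finite forest: average of m_trees stumps with independent random splits."""
    preds = [stump_predict(x, xs, ys, rng.random()) for _ in range(m_trees)]
    return statistics.fmean(preds)


def spread_over_randomness(x, xs, ys, m_trees, repeats, seed):
    """Standard deviation of the forest prediction at x when only the tree
    randomness is redrawn and the training data stay fixed."""
    rng = random.Random(seed)
    preds = [forest_predict(x, xs, ys, m_trees, rng) for _ in range(repeats)]
    return statistics.stdev(preds)


if __name__ == "__main__":
    xs, ys = make_data(200, random.Random(0))
    for m in (1, 10, 100):
        spread = spread_over_randomness(0.5, xs, ys, m, repeats=30, seed=1)
        print(f"M = {m:3d}: spread of forest prediction = {spread:.4f}")
```

The simulation only illustrates stabilization in M for fixed data. It says nothing about consistency as the sample size grows, which is the second question the abstract addresses.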


Similar resources

Asymptotic Cost of Cutting Down Random Free Trees

In this work, we calculate the limit distribution of the total cost incurred by splitting a tree selected at random from the set of all finite free trees. This total cost is considered to be an additive functional induced by a toll equal to the square of the size of tree. The main tools used are the recent results connecting the asymptotics of generating functions with the asymptotics of...


Random forests algorithm in podiform chromite prospectivity mapping in Dolatabad area, SE Iran

The Dolatabad area located in SE Iran is a well-endowed terrain owning several chromite mineralized zones. These chromite ore bodies are all hosted in a colored mélange complex zone comprising harzburgite, dunite, and pyroxenite. These deposits are irregular in shape, and are distributed as small lenses along colored mélange zones. The area has a great potential for discovering further chromite...


Fault Locating in High Voltage Transmission Lines Based on Harmonic Components of One-end Voltage Using Random Forests

In this paper, an approach is proposed for accurate locating of single phase faults in transmission lines using voltage signals measured at one-end. In this method, harmonic components of the voltage signals are extracted through Discrete Fourier Transform (DFT) and are normalized by a transformation. The proposed fault locator, which is designed based on Random Forests (RF) algorithm, is train...


A Study on the Accuracy and Precision of Estimation of the Number, Basal Area and Standing Trees Volume per Hectare Using of some Sampling Methods in Forests of NavAsalem

The present study aimed to investigate the accuracy and precision of estimates of the number, basal area and volume of standing trees obtained by random and systematic random sampling in the forests of West Guilan. The cost or inventory time was determined using the criterion (E%)² × T. Inventory was carried out by complete sampling (census) in an area of 52 hectares. The study area (sect...


Asymptotics for the infinite time ruin probability of a dependent risk model with a constant interest rate and dominatedly varying-tailed claim sizes

This paper mainly considers a nonstandard risk model with a constant interest rate, where both the claim sizes and the inter-arrival times follow certain dependence structures. When the claim sizes are dominatedly varying-tailed, asymptotics for the infinite time ruin probability of this dependent risk model are given.



Journal:
  • J. Multivariate Analysis

Volume 146

Publication date: 2016